125 research outputs found

    Using Apache Lucene to Search Vector of Locally Aggregated Descriptors

    Full text link
    Surrogate Text Representation (STR) is a profitable solution to efficient similarity search on metric space using conventional text search engines, such as Apache Lucene. This technique is based on comparing the permutations of some reference objects in place of the original metric distance. However, the Achilles heel of STR approach is the need to reorder the result set of the search according to the metric distance. This forces to use a support database to store the original objects, which requires efficient random I/O on a fast secondary memory (such as flash-based storages). In this paper, we propose to extend the Surrogate Text Representation to specifically address a class of visual metric objects known as Vector of Locally Aggregated Descriptors (VLAD). This approach is based on representing the individual sub-vectors forming the VLAD vector with the STR, providing a finer representation of the vector and enabling us to get rid of the reordering phase. The experiments on a publicly available dataset show that the extended STR outperforms the baseline STR achieving satisfactory performance near to the one obtained with the original VLAD vectors.Comment: In Proceedings of the 11th Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2016) - Volume 4: VISAPP, p. 383-39

    A Content-Addressable Network for Similarity Search in Metric Spaces

    Get PDF
    Because of the ongoing digital data explosion, more advanced search paradigms than the traditional exact match are needed for contentbased retrieval in huge and ever growing collections of data produced in application areas such as multimedia, molecular biology, marketing, computer-aided design and purchasing assistance. As the variety of data types is fast going towards creating a database utilized by people, the computer systems must be able to model human fundamental reasoning paradigms, which are naturally based on similarity. The ability to perceive similarities is crucial for recognition, classification, and learning, and it plays an important role in scientific discovery and creativity. Recently, the mathematical notion of metric space has become a useful abstraction of similarity and many similarity search indexes have been developed. In this thesis, we accept the metric space similarity paradigm and concentrate on the scalability issues. By exploiting computer networks and applying the Peer-to-Peer communication paradigms, we build a structured network of computers able to process similarity queries in parallel. Since no centralized entities are used, such architectures are fully scalable. Specifically, we propose a Peer-to-Peer system for similarity search in metric spaces called Metric Content-Addressable Network (MCAN) which is an extension of the well known Content-Addressable Network (CAN) used for hash lookup. A prototype implementation of MCAN was tested on real-life datasets of image features, protein symbols, and text — observed results are reported. We also compared the performance of MCAN with three other, recently proposed, distributed data structures for similarity search in metric spaces

    Aggregating Local Descriptors for Epigraphs Recognition

    Get PDF
    In this paper, we consider the task of recognizing epigraphs in images such as photos taken using mobile devices. Given a set of 17,155 photos related to 14,560 epigraphs, we used a k-NearestNeighbor approach in order to perform the recognition. The contribution of this work is in evaluating state-of-the-art visual object recognition techniques in this specific context. The experimental results conducted show that Vector of Locally Aggregated Descriptors obtained aggregating SIFT descriptors is the best choice for this task.The Fourth International Conference on Digital Presentation and Preservation of Cultural and Scientific Heritage—DiPP2014 is supported by the Ministry of Education and Science and is under the patronage of UNESCO

    Combining EfficientNet and Vision Transformers for Video Deepfake Detection

    Full text link
    Deepfakes are the result of digital manipulation to obtain credible videos in order to deceive the viewer. This is done through deep learning techniques based on autoencoders or GANs that become more accessible and accurate year after year, resulting in fake videos that are very difficult to distinguish from real ones. Traditionally, CNN networks have been used to perform deepfake detection, with the best results obtained using methods based on EfficientNet B7. In this study, we combine various types of Vision Transformers with a convolutional EfficientNet B0 used as a feature extractor, obtaining comparable results with some very recent methods that use Vision Transformers. Differently from the state-of-the-art approaches, we use neither distillation nor ensemble methods. The best model achieved an AUC of 0.951 and an F1 score of 88.0%, very close to the state-of-the-art on the DeepFake Detection Challenge (DFDC)
    • …
    corecore